A dictionary to identify small molecules and drugs in free text
Abstract
MOTIVATION The scientific community has spent a lot of effort on the correct identification of gene and protein names in text, and less on the correct identification of chemical names. Dictionary-based term identification has the power to recognize the diverse representation of chemical information in the literature and map the chemicals to their database identifiers.

RESULTS We developed a dictionary for the identification of small molecules and drugs in text, combining information from UMLS, MeSH, ChEBI, DrugBank, KEGG, HMDB and ChemIDplus. Rule-based term filtering, a manual check of highly frequent terms and disambiguation rules were applied. We tested the combined dictionary and the dictionaries derived from the individual resources on an annotated corpus, and conclude the following: (i) each of the different processing steps increases precision with a minor loss of recall; (ii) the overall performance of the combined dictionary is acceptable (precision 0.67, recall 0.40 (0.80 for trivial names)); (iii) the combined dictionary performed better than the dictionary in the chemical recognizer OSCAR3; (iv) the performance of a dictionary based on ChemIDplus alone is comparable to the performance of the combined dictionary.

AVAILABILITY The combined dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web site http://www.biosemantics.org/chemlist.
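To make the idea of dictionary-based term identification concrete, the following is a minimal sketch in Python, not the authors' implementation. The toy dictionary, the placeholder identifiers (`EX:0001` and so on), the stop list `FILTERED_TERMS` and all function names are assumptions introduced for illustration only. It shows how terms found in text can be mapped to identifiers, how a simple filtering rule can remove an ambiguous term, and how precision and recall are computed against annotated spans.

```python
import re

# Hypothetical toy dictionary: lower-cased term -> placeholder identifier.
# These identifiers are invented for illustration, not real database IDs.
TOY_DICTIONARY = {
    "aspirin": "EX:0001",
    "acetylsalicylic acid": "EX:0001",
    "caffeine": "EX:0002",
    "lead": "EX:0003",  # ambiguous with the common English verb
}

# Hypothetical stop list standing in for rule-based term filtering.
FILTERED_TERMS = {"lead"}


def build_pattern(terms):
    """Compile one regex; longer terms come first so they win over substrings."""
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def find_chemicals(text, dictionary, filtered=frozenset()):
    """Return (start, end, identifier) for each dictionary term found in text."""
    terms = [term for term in dictionary if term not in filtered]
    if not terms:
        return []
    pattern = build_pattern(terms)
    return [
        (match.start(), match.end(), dictionary[match.group(0).lower()])
        for match in pattern.finditer(text)
    ]


def precision_recall(predicted, gold):
    """Compute precision and recall over exact (start, end, identifier) matches."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sentence = "Aspirin and caffeine lead to faster relief."
    gold_spans = [(0, 7, "EX:0001"), (12, 20, "EX:0002")]

    unfiltered = find_chemicals(sentence, TOY_DICTIONARY)
    filtered = find_chemicals(sentence, TOY_DICTIONARY, FILTERED_TERMS)

    print("unfiltered:", precision_recall(unfiltered, gold_spans))
    print("filtered:  ", precision_recall(filtered, gold_spans))
```

In this toy sentence, dropping the ambiguous term removes a false positive, so precision rises while recall is unchanged. On a real corpus a filtered term is sometimes a genuine chemical mention, which is one way filtering can cost some recall.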
Similar resources
A High Capacity Email Steganography Scheme using Dictionary
The main objective of steganography is to conceal a secret message within a cover-media in such a way that only the original receiver can discern the presence of the hidden message. The cover-media can be a text, email, audio, image, and video, which can be transmitted through a public channel, such as the Internet. By extending the use of email among Internet users, the provision of email steg...
Dictionary of Abstract and Concrete Words of the Russian Language: A Methodology for Creation and Application
The paper describes the first stage of a project on creating an electronic dictionary with numerical estimates of the degree of abstractness and concreteness of Russian words. Our approach is to integrate data obtained from several different sources: text corpora, psycholinguistic experiments, published dictionaries, markers of abstractness (certain suffixes) and a translation of a similar dict...
Molecular Docking Based on Virtual Screening, Molecular Dynamics and Atoms in Molecules Studies to Identify the Potential Human Epidermal Receptor 2 Intracellular Domain Inhibitors
Human epidermal growth factor receptor 2 (HER2) is a member of the epidermal growth factor receptor family having tyrosine kinase activity. Overexpression of HER2 usually causes malignant transformation of cells and is responsible for the breast cancer. In this work, the virtual screening, molecular docking, quantum mechanics and molecular dynamics methods were employed to study protein–ligand ...
Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining
BACKGROUND Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be ...
Study and Recognition of Muslim Sage Abdullah Azdi and His Medical Dictionary Called “Kitāb Al-ma”
This study seeks to identify one of the pioneers of traditional clinical medicine named Abdullah Azdi and his medical dictionary. This research is an analytical study. The focus of the search was on two keywords, Abdullah Azdi and Kitab al-Ma'ma, but the scope of the search included all appropriate terms such as: medicine, Bu Ali Sina, traditional medicine, medical dictionary, ethics, and medic...
Journal: Bioinformatics
Volume 25, Issue 22
Publication date: 2009